Educational Videos Subtitles’ Summarization Using Latent Dirichlet Allocation and Length Enhancement

Authors

Abstract

Nowadays, people use online resources such as educational videos and courses. However, such courses are mostly long; thus, summarizing them will be valuable. The video contents (visual, audio, and subtitles) could be analyzed to generate textual summaries, i.e., notes. Videos’ subtitles contain significant information. Therefore, summarizing subtitles is an effective way to concentrate on the necessary details. Most of the existing studies used Term Frequency–Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) models to create lectures’ summaries. This study takes another approach and applies Latent Dirichlet Allocation (LDA), which proved its effectiveness in document summarization. Specifically, the proposed LDA summarization model follows three phases. The first phase aims to prepare the subtitle file for modelling by performing some preprocessing steps, such as removing stop words. In the second phase, the LDA model is trained on the subtitles to generate a keywords list, which is used to extract important sentences. In the third phase, a summary is generated based on the keywords list. The summaries generated by LDA were lengthy; thus, a length enhancement method has been proposed. For evaluation, the authors developed manual summaries of the existing “EDUVSUM” dataset. The generated summaries were compared with the manually generated outlines using two methods: (i) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and (ii) human evaluation. The LDA-based summaries outperform those generated by TF-IDF and LSA. Besides reducing the summaries’ length, the proposed length enhancement method did improve the precision rates. Other domains, such as news videos, can apply the proposed method.
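The following is a minimal sketch of the third phase and the ROUGE comparison described in the abstract, not the authors' implementation. It assumes a keywords list is already available (in the paper it comes from the trained LDA model; here it is written by hand), and it stands in for the length enhancement method, whose details are not given in this listing, with a simple cap on the number of sentences. The stop-word list, function names, and sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "for"}


def tokenize(text, drop_stop_words=True):
    """Lowercase the text, split it into word tokens, optionally drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def extract_summary(sentences, keywords, max_sentences):
    """Keep the sentences containing the most keywords, in their original order.

    max_sentences is a plain length cap, used here as a placeholder for the
    paper's length enhancement method.
    """
    keyset = set(keywords)
    scored = [
        (sum(1 for w in tokenize(s) if w in keyset), i)
        for i, s in enumerate(sentences)
    ]
    # Highest keyword count first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    keep = sorted(i for score, i in top if score > 0)
    return [sentences[i] for i in keep]


def rouge_1(candidate, reference):
    """ROUGE-1 precision and recall from clipped unigram overlap."""
    cand = Counter(tokenize(candidate, drop_stop_words=False))
    ref = Counter(tokenize(reference, drop_stop_words=False))
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Hypothetical subtitle sentences and a hand-written keywords list.
sentences = [
    "Welcome back to the course.",
    "A topic model groups words into topics.",
    "Please subscribe to the channel.",
    "Each topic is a distribution over words.",
]
keywords = ["topic", "topics", "words", "model", "distribution"]
reference = "A topic model groups words into topics."

long_summary = extract_summary(sentences, keywords, max_sentences=2)
short_summary = extract_summary(sentences, keywords, max_sentences=1)
print(rouge_1(" ".join(long_summary), reference))   # (0.5, 1.0)
print(rouge_1(" ".join(short_summary), reference))  # (1.0, 1.0)
```

In this toy case, the shorter summary keeps the same recall and doubles the precision, which illustrates why trimming a lengthy extractive summary can raise ROUGE precision. It says nothing about the size of the effect reported in the paper.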


Similar resources

Comparative Summarization via Latent Dirichlet Allocation

This paper aims to explore the possibility of using Latent Dirichlet Allocation (LDA) for multi-document comparative summarization which detects the main differences in documents. The first two sections of this paper focus on the definition of comparative summarization and a brief explanation of using the LDA topic model in this context. In the last three sections, our novel method for multi-do...


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...


Tamil Document Summarization Using Latent Dirichlet Allocation

This paper proposes a system for summarizing multiple Tamil documents. The system combines statistical, semantic, and heuristic methods to extract key sentences from multiple documents, eliminating redundancies and maintaining the coherence of the selected sentences to generate the summary. In this paper, Latent Dirichlet Allocation (LDA) is used for topic...


TCF21 Binding Sites Characterization using Latent Dirichlet Allocation

Transcription factors play multiple roles in cell activity and gene expression, and discovering these roles often requires experimentation in a wet lab. We hope to bypass this process computationally by using topic modeling to infer the myriad of functions of a given transcription factor. Specifically, we apply Latent Dirichlet Allocation (LDA) to all peaks derived from running ChIP-seq on TCF2...


Spatial Latent Dirichlet Allocation

In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” an...



Journal

Journal title: Computers, Materials & Continua

Year: 2022

ISSN: 1546-2218, 1546-2226

DOI: https://doi.org/10.32604/cmc.2022.021780